Overview

Dataset statistics

Number of variables: 25
Number of observations: 58693
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 11.0 MiB
Average record size in memory: 196.0 B

Variable types

NUM: 12
BOOL: 9
CAT: 4

Reproduction

Analysis started: 2020-05-27 16:30:28.026002
Analysis finished: 2020-05-27 16:31:21.244848
Duration: 53.22 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

Brand has 44055 (75.1%) zeros
Quantity has 44055 (75.1%) zeros
Last_Inc_Brand has 44133 (75.2%) zeros

Variables

ID
Real number (ℝ≥0)

Distinct count: 500
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 200000252.89748353
Minimum: 200000001
Maximum: 200000500
Zeros: 0
Zeros (%): 0.0%
Memory size: 458.5 KiB

Quantile statistics

Minimum: 200000001
5th percentile: 200000029
Q1: 200000128
Median: 200000252
Q3: 200000378
95th percentile: 200000476
Maximum: 200000500
Range: 499
Interquartile range (IQR): 250

Descriptive statistics

Standard deviation: 144.3166768
Coefficient of variation (CV): 7.215824717e-07
Kurtosis: -1.206182483
Mean: 200000252.9
Median Absolute Deviation (MAD): 125
Skewness: -0.00945628919
Sum: 1.173861484e+13
Variance: 20827.30321

Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
200000187           358    0.6%
200000041           353    0.6%
200000247           347    0.6%
200000097           182    0.3%
200000297           179    0.3%
200000393           179    0.3%
200000345           178    0.3%
200000399           178    0.3%
200000064           175    0.3%
200000351           173    0.3%
Other values (490)  56391  96.1%

Minimum 5 values

Value      Count  Frequency (%)
200000001  101    0.2%
200000002  87     0.1%
200000003  97     0.2%
200000004  85     0.1%
200000005  111    0.2%

Maximum 5 values

Value      Count  Frequency (%)
200000500  124    0.2%
200000499  106    0.2%
200000498  131    0.2%
200000497  120    0.2%
200000496  120    0.2%

Day
Real number (ℝ≥0)

Distinct count: 730
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 349.43107355221235
Minimum: 1
Maximum: 730
Zeros: 0
Zeros (%): 0.0%
Memory size: 458.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 32
Q1: 161
Median: 343
Q3: 530
95th percentile: 689
Maximum: 730
Range: 729
Interquartile range (IQR): 369

Descriptive statistics

Standard deviation: 212.0450583
Coefficient of variation (CV): 0.6068294274
Kurtosis: -1.216555375
Mean: 349.4310736
Median Absolute Deviation (MAD): 184
Skewness: 0.09270947784
Sum: 20509158
Variance: 44963.10673

Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
395                 179    0.3%
51                  173    0.3%
18                  168    0.3%
25                  165    0.3%
58                  161    0.3%
11                  161    0.3%
449                 160    0.3%
70                  159    0.3%
117                 158    0.3%
44                  157    0.3%
Other values (720)  57052  97.2%

Minimum 5 values

Value  Count  Frequency (%)
1      113    0.2%
2      29     < 0.1%
3      81     0.1%
4      136    0.2%
5      70     0.1%

Maximum 5 values

Value  Count  Frequency (%)
730    57     0.1%
729    50     0.1%
728    32     0.1%
727    102    0.2%
726    99     0.2%

Incidence
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 458.5 KiB

Value  Count  Frequency (%)
0      44055  75.1%
1      14638  24.9%

Brand
Real number (ℝ≥0)

ZEROS

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8443085206072274
Minimum: 0
Maximum: 5
Zeros: 44055
Zeros (%): 75.1%
Memory size: 458.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.633083365
Coefficient of variation (CV): 1.934225849
Kurtosis: 1.368604017
Mean: 0.8443085206
Median Absolute Deviation (MAD): 0
Skewness: 1.714161805
Sum: 49555
Variance: 2.666961277

Histogram with fixed size bins (bins=10)

Common values

Value  Count  Frequency (%)
0      44055  75.1%
5      4978   8.5%
2      4542   7.7%
4      2927   5.0%
1      1350   2.3%
3      841    1.4%

Minimum 5 values

Value  Count  Frequency (%)
0      44055  75.1%
1      1350   2.3%
2      4542   7.7%
3      841    1.4%
4      2927   5.0%

Maximum 5 values

Value  Count  Frequency (%)
5      4978   8.5%
4      2927   5.0%
3      841    1.4%
2      4542   7.7%
1      1350   2.3%
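As a worked check of how Brand's summary statistics follow from its value counts, the sketch below recomputes the number of observations, sum, mean, variance, standard deviation, coefficient of variation and share of zeros from the Brand frequency table in this report (counts copied by hand). The reported variance matches the sample variance with divisor n − 1; that is an inference from the numbers, not something the report states.

```python
import math

# Brand value counts copied from the report's frequency table (value -> count).
brand_counts = {0: 44055, 1: 1350, 2: 4542, 3: 841, 4: 2927, 5: 4978}

n = sum(brand_counts.values())                                       # number of observations
total = sum(value * count for value, count in brand_counts.items())  # the "Sum" statistic
mean = total / n
# Sample variance (divisor n - 1); dividing by n would give a slightly different value.
variance = sum(count * (value - mean) ** 2 for value, count in brand_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                   # coefficient of variation: standard deviation / mean
zero_share = brand_counts[0] / n  # basis of the "Brand has ... zeros" warning

print(n, total)
print(mean, variance, std, cv)
print(f"{zero_share:.1%}")
```

Compare the printed values with the Descriptive statistics block for Brand.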
 

Quantity
Real number (ℝ≥0)

ZEROS

Distinct count: 16
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.6919734891724737
Minimum: 0
Maximum: 15
Zeros: 44055
Zeros (%): 75.1%
Memory size: 458.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 4
Maximum: 15
Range: 15
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.498734039
Coefficient of variation (CV): 2.16588361
Kurtosis: 11.49524955
Mean: 0.6919734892
Median Absolute Deviation (MAD): 0
Skewness: 2.944460607
Sum: 40614
Variance: 2.24620372

Histogram with fixed size bins (bins=10)

Common values

Value             Count  Frequency (%)
0                 44055  75.1%
3                 4241   7.2%
2                 3800   6.5%
1                 3542   6.0%
4                 1165   2.0%
5                 921    1.6%
6                 355    0.6%
7                 200    0.3%
8                 133    0.2%
9                 106    0.2%
Other values (6)  175    0.3%

Minimum 5 values

Value  Count  Frequency (%)
0      44055  75.1%
1      3542   6.0%
2      3800   6.5%
3      4241   7.2%
4      1165   2.0%

Maximum 5 values

Value  Count  Frequency (%)
15     2      < 0.1%
14     5      < 0.1%
13     20     < 0.1%
12     29     < 0.1%
11     38     0.1%

Last_Inc_Brand
Real number (ℝ≥0)

ZEROS

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8407987323871671
Minimum: 0
Maximum: 5
Zeros: 44133
Zeros (%): 75.2%
Memory size: 458.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.63162799
Coefficient of variation (CV): 1.940569041
Kurtosis: 1.391106065
Mean: 0.8407987324
Median Absolute Deviation (MAD): 0
Skewness: 1.721146777
Sum: 49349
Variance: 2.662209898

Histogram with fixed size bins (bins=10)

Common values

Value  Count  Frequency (%)
0      44133  75.2%
5      4972   8.5%
2      4491   7.7%
4      2914   5.0%
1      1349   2.3%
3      834    1.4%

Minimum 5 values

Value  Count  Frequency (%)
0      44133  75.2%
1      1349   2.3%
2      4491   7.7%
3      834    1.4%
4      2914   5.0%

Maximum 5 values

Value  Count  Frequency (%)
5      4972   8.5%
4      2914   5.0%
3      834    1.4%
2      4491   7.7%
1      1349   2.3%

Last_Inc_Quantity
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 458.5 KiB

Value  Count  Frequency (%)
0      44133  75.2%
1      14560  24.8%

Price_1
Real number (ℝ≥0)

Distinct count: 37
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.392074352989283
Minimum: 1.1
Maximum: 1.59
Zeros: 0
Zeros (%): 0.0%
Memory size: 458.5 KiB

Quantile statistics

Minimum: 1.1
5th percentile: 1.21
Q1: 1.34
Median: 1.39
Q3: 1.47
95th percentile: 1.5
Maximum: 1.59
Range: 0.49
Interquartile range (IQR): 0.13

Descriptive statistics

Standard deviation: 0.09113873559
Coefficient of variation (CV): 0.06546973256
Kurtosis: -0.1785760158
Mean: 1.392074353
Median Absolute Deviation (MAD): 0.07
Skewness: -0.5556097282
Sum: 81705.02
Variance: 0.008306269126

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
1.47               7131   12.1%
1.39               6091   10.4%
1.35               6027   10.3%
1.5                4091   7.0%
1.33               3393   5.8%
1.34               3186   5.4%
1.48               3015   5.1%
1.49               2664   4.5%
1.46               2471   4.2%
1.37               2421   4.1%
Other values (27)  18203  31.0%

Minimum 5 values

Value  Count  Frequency (%)
1.1    158    0.3%
1.14   327    0.6%
1.17   116    0.2%
1.19   1102   1.9%
1.2    135    0.2%

Maximum 5 values

Value  Count  Frequency (%)
1.59   568    1.0%
1.52   678    1.2%
1.51   1354   2.3%
1.5    4091   7.0%
1.49   2664   4.5%

Price_2
Real number (ℝ≥0)

Distinct count: 30
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.7809989266181656
Minimum: 1.26
Maximum: 1.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 458.5 KiB

Quantile statistics

Minimum: 1.26
5th percentile: 1.48
Q1: 1.58
Median: 1.88
Q3: 1.89
95th percentile: 1.9
Maximum: 1.9
Range: 0.64
Interquartile range (IQR): 0.31

Descriptive statistics

Standard deviation: 0.1708676874
Coefficient of variation (CV): 0.09593924221
Kurtosis: 0.6467319018
Mean: 1.780998927
Median Absolute Deviation (MAD): 0.02
Skewness: -1.382151978
Sum: 104532.17
Variance: 0.02919576659

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
1.89               20033  34.1%
1.9                7660   13.1%
1.57               4921   8.4%
1.88               3361   5.7%
1.87               2997   5.1%
1.85               2680   4.6%
1.51               1411   2.4%
1.58               1298   2.2%
1.56               1219   2.1%
1.86               1113   1.9%
Other values (20)  12000  20.4%

Minimum 5 values

Value  Count  Frequency (%)
1.26   875    1.5%
1.27   120    0.2%
1.31   200    0.3%
1.35   1089   1.9%
1.36   572    1.0%

Maximum 5 values

Value  Count  Frequency (%)
1.9    7660   13.1%
1.89   20033  34.1%
1.88   3361   5.7%
1.87   2997   5.1%
1.86   1113   1.9%

Price_3
Real number (ℝ≥0)

Distinct count: 21
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0067887141567136
Minimum: 1.87
Maximum: 2.14
Zeros: 0
Zeros (%): 0.0%
Memory size: 458.5 KiB

Quantile statistics

Minimum: 1.87
5th percentile: 1.93
Q1: 1.97
Median: 2.01
Q3: 2.06
95th percentile: 2.07
Maximum: 2.14
Range: 0.27
Interquartile range (IQR): 0.09

Descriptive statistics

Standard deviation: 0.04686722504
Coefficient of variation (CV): 0.02335433955
Kurtosis: -0.2863220376
Mean: 2.006788714
Median Absolute Deviation (MAD): 0.04
Skewness: -0.05086428685
Sum: 117784.45
Variance: 0.002196536783

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
1.99               11053  18.8%
2.02               9563   16.3%
2.06               8148   13.9%
2.07               4962   8.5%
1.97               4580   7.8%
2.01               4512   7.7%
1.95               3860   6.6%
2                  2305   3.9%
1.94               1919   3.3%
1.91               1664   2.8%
Other values (11)  6127   10.4%

Minimum 5 values

Value  Count  Frequency (%)
1.87   133    0.2%
1.89   501    0.9%
1.91   1664   2.8%
1.93   1007   1.7%
1.94   1919   3.3%

Maximum 5 values

Value  Count  Frequency (%)
2.14   271    0.5%
2.13   52     0.1%
2.11   458    0.8%
2.09   1206   2.1%
2.07   4962   8.5%

Price_4
Real number (ℝ≥0)

Distinct count: 26
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.1599453086398714
Minimum: 1.76
Maximum: 2.26
Zeros: 0
Zeros (%): 0.0%
Memory size: 458.5 KiB

Quantile statistics

Minimum: 1.76
5th percentile: 1.97
Q1: 2.12
Median: 2.17
Q3: 2.24
95th percentile: 2.26
Maximum: 2.26
Range: 0.5
Interquartile range (IQR): 0.12

Descriptive statistics

Standard deviation: 0.08982459456
Coefficient of variation (CV): 0.04158651342
Kurtosis: 1.41483285
Mean: 2.159945309
Median Absolute Deviation (MAD): 0.05
Skewness: -1.331337362
Sum: 126773.67
Variance: 0.008068457787

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
2.21               10899  18.6%
2.24               10872  18.5%
2.16               10830  18.5%
2.09               4891   8.3%
2.26               4655   7.9%
2.12               4481   7.6%
1.97               2198   3.7%
2.18               1869   3.2%
1.9                987    1.7%
2.15               736    1.3%
Other values (16)  6275   10.7%

Minimum 5 values

Value  Count  Frequency (%)
1.76   77     0.1%
1.89   563    1.0%
1.9    987    1.7%
1.94   578    1.0%
1.96   602    1.0%

Maximum 5 values

Value  Count  Frequency (%)
2.26   4655   7.9%
2.24   10872  18.5%
2.21   10899  18.6%
2.2    481    0.8%
2.19   507    0.9%

Price_5
Real number (ℝ≥0)

Distinct count: 44
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.65479767604314
Minimum: 2.11
Maximum: 2.8
Zeros: 0
Zeros (%): 0.0%
Memory size: 458.5 KiB

Quantile statistics

Minimum: 2.11
5th percentile: 2.44
Q1: 2.63
Median: 2.67
Q3: 2.7
95th percentile: 2.79
Maximum: 2.8
Range: 0.69
Interquartile range (IQR): 0.07

Descriptive statistics

Standard deviation: 0.09827182872
Coefficient of variation (CV): 0.03701669231
Kurtosis: 3.907127941
Mean: 2.654797676
Median Absolute Deviation (MAD): 0.03
Skewness: -1.52909921
Sum: 155818.04
Variance: 0.009657352321

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
2.67               11794  20.1%
2.7                5582   9.5%
2.66               5207   8.9%
2.79               4347   7.4%
2.64               3615   6.2%
2.69               2491   4.2%
2.62               2334   4.0%
2.63               2217   3.8%
2.77               1698   2.9%
2.49               1677   2.9%
Other values (34)  17731  30.2%

Minimum 5 values

Value  Count  Frequency (%)
2.11   114    0.2%
2.19   80     0.1%
2.27   174    0.3%
2.29   56     0.1%
2.34   461    0.8%

Maximum 5 values

Value  Count  Frequency (%)
2.8    1157   2.0%
2.79   4347   7.4%
2.78   868    1.5%
2.77   1698   2.9%
2.76   738    1.3%

Promotion_1
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 458.5 KiB

Value  Count  Frequency (%)
0      38512  65.6%
1      20181  34.4%

Promotion_2
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 458.5 KiB

Value  Count  Frequency (%)
0      40169  68.4%
1      18524  31.6%

Promotion_3
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 458.5 KiB

Value  Count  Frequency (%)
0      56181  95.7%
1      2512   4.3%

Promotion_4
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 458.5 KiB

Value  Count  Frequency (%)
0      51776  88.2%
1      6917   11.8%

Promotion_5
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 458.5 KiB

Value  Count  Frequency (%)
0      56588  96.4%
1      2105   3.6%

Sex
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 458.5 KiB

Value  Count  Frequency (%)
0      36044  61.4%
1      22649  38.6%

Marital status
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 458.5 KiB

Value  Count  Frequency (%)
0      35620  60.7%
1      23073  39.3%

Age
Real number (ℝ≥0)

Distinct count: 56
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.79396180123695
Minimum: 18
Maximum: 75
Zeros: 0
Zeros (%): 0.0%
Memory size: 458.5 KiB

Quantile statistics

Minimum: 18
5th percentile: 24
Q1: 30
Median: 36
Q3: 46
95th percentile: 63
Maximum: 75
Range: 57
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 12.05244748
Coefficient of variation (CV): 0.3106784385
Kurtosis: -0.2225011979
Mean: 38.7939618
Median Absolute Deviation (MAD): 8
Skewness: 0.7231417785
Sum: 2276934
Variance: 145.2614902

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
35                 2859   4.9%
27                 2859   4.9%
31                 2759   4.7%
32                 2487   4.2%
25                 2436   4.2%
26                 2403   4.1%
40                 2265   3.9%
37                 2155   3.7%
36                 2011   3.4%
33                 1931   3.3%
Other values (46)  34528  58.8%

Minimum 5 values

Value  Count  Frequency (%)
18     235    0.4%
19     106    0.2%
20     196    0.3%
21     467    0.8%
22     319    0.5%

Maximum 5 values

Value  Count  Frequency (%)
75     72     0.1%
74     94     0.2%
73     121    0.2%
71     101    0.2%
70     92     0.2%

Education
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 458.5 KiB

Value  Count  Frequency (%)
1      37161  63.3%
2      11716  20.0%
0      8462   14.4%
3      1354   2.3%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Income
Real number (ℝ≥0)

Distinct count: 499
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 121841.64431874329
Minimum: 38247
Maximum: 309364
Zeros: 0
Zeros (%): 0.0%
Memory size: 458.5 KiB

Quantile statistics

Minimum: 38247
5th percentile: 68834
Q1: 95541
Median: 117971
Q3: 138525
95th percentile: 194882
Maximum: 309364
Range: 271117
Interquartile range (IQR): 42984

Descriptive statistics

Standard deviation: 40643.74068
Coefficient of variation (CV): 0.3335783993
Kurtosis: 3.868321723
Mean: 121841.6443
Median Absolute Deviation (MAD): 21506
Skewness: 1.424871282
Sum: 7151251630
Variance: 1651913656

Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
124597              358    0.6%
106205              353    0.6%
158193              347    0.6%
69487               228    0.4%
135275              182    0.3%
123003              179    0.3%
113012              179    0.3%
147626              178    0.3%
95438               178    0.3%
81200               175    0.3%
Other values (489)  56336  96.0%

Minimum 5 values

Value   Count  Frequency (%)
38247   114    0.2%
43684   129    0.2%
43805   98     0.2%
53608   93     0.2%
57480   118    0.2%

Maximum 5 values

Value   Count  Frequency (%)
309364  105    0.2%
308529  109    0.2%
308491  125    0.2%
281923  135    0.2%
281084  123    0.2%

Occupation
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 458.5 KiB

Value  Count  Frequency (%)
1      29882  50.9%
0      21032  35.8%
2      7779   13.3%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Settlement size
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 458.5 KiB

Value  Count  Frequency (%)
0      32081  54.7%
1      14727  25.1%
2      11885  20.2%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Segment
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 229.3 KiB

Value  Count  Frequency (%)
1      21401  36.5%
3      13580  23.1%
2      12217  20.8%
0      11495  19.6%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, provided the scale factors are positive (a negative factor flips the sign of r). This implies that for a linear relationship the slope, that is, the angle of the line to the x-axis, does not affect r, as long as the line is not horizontal.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
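A minimal sketch of the calculation described above, in plain Python with no third-party libraries; the function name `pearson_r` is illustrative, not a pandas-profiling API. It divides the summed cross-deviations by the square root of the product of the summed squared deviations, since the 1/(n − 1) factors of the covariance and the two standard deviations cancel. The last three calls illustrate the invariance property: lines with different positive slopes all give r = 1, and a negative scale factor gives r = -1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of deviations; the 1/(n - 1)
    # normalisation of covariance and both standard deviations cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


xs = [1, 2, 3, 4, 5]
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))     # partial linear agreement
print(pearson_r(xs, [3 * x + 7 for x in xs]))    # steep line, positive slope
print(pearson_r(xs, [0.1 * x - 2 for x in xs]))  # shallow line, positive slope
print(pearson_r(xs, [-2 * x for x in xs]))       # negative scale factor: sign flips
```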

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, which makes it better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
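The rank-based definition can be sketched the same way: replace each value by its rank and apply Pearson's formula to the ranks. Tied values, which are common in this dataset, receive the average of the ranks they span; this is the usual convention (pandas' `Series.rank` defaults to `method="average"`), and the helper names here are illustrative. With y = x³ the relationship is monotonic but not linear, so ρ equals 1 while r stays below 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r, applied here to rank variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` to the last sorted position holding the same value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions start..end are ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]
print(average_ranks([10, 20, 20, 30]))
print(spearman_rho(xs, cubes), pearson_r(xs, cubes))
```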

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n − 1)/2. This basic form is known as τ_a; the τ_b variant adjusts the denominator for tied values.
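A direct sketch of the pair-counting definition of τ_a: pairs tied in X or in Y count as neither concordant nor discordant but stay in the denominator. The function name is illustrative. Comparing every pair costs O(n²), which is fine for small inputs; for tens of thousands of rows, as here, library routines such as SciPy's `scipy.stats.kendalltau` use a faster algorithm and return the tie-adjusted τ_b by default, which gives different values when many values are tied.

```python
def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Same sign of the X and Y differences -> concordant pair,
            # opposite signs -> discordant, zero (a tie) -> neither.
            direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if direction > 0:
                concordant += 1
            elif direction < 0:
                discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant of 6 pairs
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # fully reversed order
```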

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
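A sketch of both the plain and the bias-corrected estimator for an r × k contingency table of counts. The plain form is V = sqrt((χ²/n) / min(k − 1, r − 1)); the correction shrinks φ² = χ²/n and the effective numbers of rows and columns, following the form in which Bergsma's proposal is commonly stated. This is an illustrative plain-Python implementation, not pandas-profiling's own code, and it assumes every row and column total is non-zero.

```python
import math


def cramers_v(table, bias_corrected=True):
    """Cramér's V for an r x k table of counts given as a list of rows.

    Assumes every row total and column total is non-zero.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson's chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n

    if not bias_corrected:
        return math.sqrt(phi2 / min(k - 1, r - 1))

    # Bias correction (as commonly attributed to Bergsma, 2013):
    # shrink phi^2 and the effective numbers of rows and columns.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


print(cramers_v([[10, 20], [30, 60]]))                        # independent rows and columns
print(cramers_v([[30, 10], [10, 30]], bias_corrected=False))  # plain estimator
print(cramers_v([[30, 10], [10, 30]]))                        # corrected, slightly smaller
```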

Missing values

Sample

First rows

,ID,Day,Incidence,Brand,Quantity,Last_Inc_Brand,Last_Inc_Quantity,Price_1,Price_2,Price_3,Price_4,Price_5,Promotion_1,Promotion_2,Promotion_3,Promotion_4,Promotion_5,Sex,Marital status,Age,Education,Income,Occupation,Settlement size,Segment
0,200000001,1,0,0,0,0,0,1.59,1.87,2.01,2.09,2.66,0,1,0,0,0,0,0,47,1,110866,1,0,1
1,200000001,11,0,0,0,0,0,1.51,1.89,1.99,2.09,2.66,0,0,0,0,0,0,0,47,1,110866,1,0,1
2,200000001,12,0,0,0,0,0,1.51,1.89,1.99,2.09,2.66,0,0,0,0,0,0,0,47,1,110866,1,0,1
3,200000001,16,0,0,0,0,0,1.52,1.89,1.98,2.09,2.66,0,0,0,0,0,0,0,47,1,110866,1,0,1
4,200000001,18,0,0,0,0,0,1.52,1.89,1.99,2.09,2.66,0,0,0,0,0,0,0,47,1,110866,1,0,1
5,200000001,23,0,0,0,0,0,1.50,1.90,1.99,2.09,2.66,0,0,0,0,0,0,0,47,1,110866,1,0,1
6,200000001,28,1,2,2,0,0,1.50,1.90,1.99,2.09,2.67,0,0,0,0,0,0,0,47,1,110866,1,0,1
7,200000001,37,0,0,0,2,1,1.50,1.90,1.99,2.09,2.67,0,0,0,0,0,0,0,47,1,110866,1,0,1
8,200000001,41,0,0,0,0,0,1.35,1.58,1.97,2.09,2.67,1,1,1,0,0,0,0,47,1,110866,1,0,1
9,200000001,43,0,0,0,0,0,1.35,1.58,1.97,2.09,2.67,1,1,1,0,0,0,0,47,1,110866,1,0,1

Last rows

,ID,Day,Incidence,Brand,Quantity,Last_Inc_Brand,Last_Inc_Quantity,Price_1,Price_2,Price_3,Price_4,Price_5,Promotion_1,Promotion_2,Promotion_3,Promotion_4,Promotion_5,Sex,Marital status,Age,Education,Income,Occupation,Settlement size,Segment
58683,200000500,681,0,0,0,0,0,1.42,1.85,2.06,2.24,2.77,1,1,0,0,0,0,0,42,1,120946,1,0,1
58684,200000500,689,0,0,0,0,0,1.50,1.87,2.06,2.24,2.78,0,0,0,0,0,0,0,42,1,120946,1,0,1
58685,200000500,693,0,0,0,0,0,1.42,1.51,2.02,2.24,2.77,0,1,0,0,0,0,0,42,1,120946,1,0,1
58686,200000500,694,0,0,0,0,0,1.42,1.51,2.02,2.24,2.77,0,1,0,0,0,0,0,42,1,120946,1,0,1
58687,200000500,697,1,2,6,0,0,1.42,1.51,1.97,2.24,2.78,0,0,0,0,0,0,0,42,1,120946,1,0,1
58688,200000500,703,0,0,0,2,1,1.41,1.85,2.01,2.24,2.79,0,0,1,0,0,0,0,42,1,120946,1,0,1
58689,200000500,710,0,0,0,0,0,1.36,1.84,2.09,2.24,2.77,0,0,0,0,0,0,0,42,1,120946,1,0,1
58690,200000500,717,0,0,0,0,0,1.50,1.80,2.14,2.24,2.75,0,0,0,0,0,0,0,42,1,120946,1,0,1
58691,200000500,722,1,2,3,0,0,1.51,1.82,2.09,2.24,2.80,0,0,0,0,0,0,0,42,1,120946,1,0,1
58692,200000500,726,0,0,0,2,1,1.51,1.82,2.09,2.24,2.80,0,0,0,0,0,0,0,42,1,120946,1,0,1